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ABSTRACT 

Stellar oscillations can provide a wealth of information about a star, which can be extracted from 
observed time series of the star's brightness or radial velocity. In this paper we address the question 
of how to extract as much information as possible from such a dataset. We have developed a Markov 
Chain Monte Carlo (MCMC) code that is able to infer the number of oscillation frequencies present 
in the signal and their values (with corresponding uncertainties) , without having to fit the amplitudes 
and phases. Gaps in the data do not have any serious consequences for this method; in cases where 
severe aliasing exists, any ambiguity in the frequency determinations will be reflected in the results. 
It also allows us to infer parameters of the frequency pattern, such as the large separation Av. We 
have previously applied this method to the star v Indi (Bedding et al. 2006), and here we describe the 
method fully and apply it to simulated datasets, showing that the code is able to give correct results 
even when some of the model assumptions are violated. In particular, the non-sinusoidal nature of 
the individual oscillation modes due to stochastic excitation and damping has no major impact on 
the usefulness of our approach. 

Subject headings: stars: oscillations — methods: data analysis — methods: statistical 
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1. INTRODUCTION 

Asteroseismology is fast becoming a standard tool in 
stellar astrophysics (e.g. Kurtz 2005). The properties of 
a star's oscillations are determined by the star's structure 
and composition, so measurements of those oscillations 
have enabled astronomers to probe the interiors of stars. 
In particular, the frequencies of the various modes can 
provide information about the speed of sound at various 
depths in the star, and hence constrain its composition. 
Recently, there has been a rapid increase in the amount 
of data available for studying solar-like stars in this way 
(for reviews, see Bedding & Kjeldsen 2003; Bouchy & 
Carrier 2003; Kjeldsen & Bedding 2004). 

The data analysis procedures used for asteroseismol- 
ogy have tended to be based on Fourier methods. For 
example, the first step taken by most authors analysing 
the raw time series data is usually the computation of 
the power spectrum, or periodogram. However, it has 
been shown (Bretthorst 1988) that the periodogram is 
only an optimal procedure if we are estimating the fre- 
quency of the oscillations with a model that assumes that 
only one frequency is present, or, if there are many fre- 
quencies, that they are well separated. In the case of 
solar-like oscillations, we expect the star to be oscillat- 
ing in many modes simultaneously, some of which may 
be closely spaced. Furthermore, the modes are damped 
and stochastically excited, so each mode cannot be repre- 
sented by a single sinusoid. There is also a guiding theory 
telling us that the frequencies of the modes are not arbi- 
trary, but form a well defined pattern. This constitutes 
additional information that we might wish to take into 
account when analysing observations. 

The desire to extract as much information as possi- 
ble from an astronomical dataset, and to combine this 
with prior information from other datasets or theoreti- 
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cal considerations, has led to an explosion in the use of 
Bayesian methods (e.g. Gregory 2005; Brewer & Lewis 
2006; Parkinson, Mukherjee & Liddle 2006). The basic 
format of Bayesian inference is as follows. Our state of 
knowledge about any set of parameters 9 is described 
by a probability distribution over the parameter space. 
With prior information and assumptions /, a prior dis- 
tribution p(9\I) (read as the distribution for 9 given I) is 
assigned to describe what we know about the parameters 
before taking into account the current dataset. Then, 
the effect of taking into account data D is to modify our 
state of knowledge from the prior distribution p(9\I) to 
the posterior distribution p(9\T) 1 I) 1 given by 



p(9\T>,I)<xp(9\I)p(T>\9,I). 



fl 



The distribution p(D\0,I) is the probability distribu- 
tion for the data given the parameters. When the data 
are fixed, this is a function of the parameters and is called 
the likelihood function. Hence, whether a model (a point 
in the parameter space) becomes plausible after taking 
into account data D depends on how plausible the model 
was before taking into account the data and on how well 
the model predicts the observed data. This kind of rea- 
soning has led to the development of new methods of 
data analysis, as well as new interpretations and under- 
standing of familiar methods 1 . An example relevant to 
spectral analysis is the finding of Bretthorst (1988) that 
the periodogram is proportional to the log of the poste- 
rior probability density for the frequency of an oscillat- 
ing signal, under certain assumptions. For more details 
on Bayesian data analysis, including its relationship to 
common Fourier techniques, see the textbook by Gregory 
(2005). 

1 For example, a least squares fit of a model to some data can be 
interpreted as finding the most probable set of parameters, with 
a Gaussian distribution for the data points and no strong prior 
information. 
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Typically, the datasets that are used for asteroseis- 
mology are time series of cither the brightness (pho- 
tometry) or the radial velocity of the stellar surface 
(from spectroscopy), measured at a discrete set of times 
{h,t2, ...,fjv}, and with each data point subject to an 
observational error {ei, e2, ejv}. This results in the 
observed values for the radial velocity (or magnitude) y 
at each of the N times: 



Vi = /(*») + ei 



(2) 



where /(f) is the true form of the signal, and the {e^} are 
the observational errors. If the star is oscillating in M 
different (pure) modes, with frequencies {y\, 1/2, vm}, 
then the functional form of the signal is 



/(*) 



M 

£ 



(AjCos(2TTVjt) + Bjsin(2irvjt)) (3) 



The goal of the data analysis is usually to use the ob- 
served data y = {yi, 1/2, j/at} to infer the frequencies 
v = {1/1, V2, vm}, the number of frequencies M and 
possibly their amplitudes. However, the values of the 
frequencies are not arbitrary. For low degree p-mode os- 
cillations, the frequencies are given approximately by 

!/„,; = Au(n + -I + e) - 1(1 + 1)D (4) 

The values of the parameters of this pattern, Au, e and 
D 0} are also of interest as they are related to the sound 
speed at various depths in the star (Brown et al. 1994). 
Throughout this paper, we will denote any parameters 
of the frequency pattern collectively by 9, and concen- 
trate on finding these parameters and the frequencies. 
The amplitudes {Ai} 7 {Bi} are regarded as nuisance pa- 
rameters. In a recent paper (Bedding et al. 2006), wc 
used a Bayesian method to estimate the large separation 
Av of the metal-poor subgiant v Indi. The purpose of 
this paper is to describe the method in greater detail and 
demonstrate its effectiveness on simulated data. 

2. METHOD 

2.1. Derivation 

As with all Bayesian calculations, the first step is to 
write down Bayes' theorem (Equation 1) for the joint 
probability distribution for all of the unknown parame- 
ters (the frequencies f, the corresponding sine and cosine 
amplitudes A and B, and the frequency pattern param- 
eters 9), given the data y and the prior information or 
assumptions /: 

p(M, v, A, B, 0|y, /) cx p(M, u, A, B, 0\I)p(y\M, u, A, B, 

(5) 

where p(M, v, A, B, 9\I) is the prior probability dis- 
tribution for all of the unknown parameters, and 
p(y\M, v, A, B, 9, 1) is the likelihood function, or the 
probability density for obtaining the observed data, given 
the parameters. From equations 2 and 3, it is clear that 
knowing the 9 parameters is irrelevant for predicting the 
observed data, if we already knew the frequencies and 
amplitudes. Therefore the explicit dependence on 9 can 
be dropped from the likelihood function: 

p(y\M, u, A, B, 9, 1) = p(y\M, v, A, B, /) (6) 



Expanding the prior probability density with the product 
rule, we have 

p(M, v, A, B, 0|y, /) cx p(M, 9\I)p(u, A,B\M, 9, 1) 

xp(y|M,i/,A,B,J)(7) 
cx p(M\I)p(0\I)p(v\M, 6, 1)p(A\M, I)p(B\M, I) 

xp(y|M,i/,A,B,J)(8) 

where we have made the assumptions that the prior prob- 
ability densities for all of the parameters are independent 
(e.g. knowing a frequency would not tell us anything 
about the amplitude), and that knowing the 9 parame- 
ters would tell us something about the frequencies, but 
nothing about the amplitudes. 

Since we are not interested in the amplitudes, we can 
integrate out the A and B variables, to obtain the pos- 
terior distribution for M, v and 8 alone: 

p(M, v, 0|y, I) cx p(M\I)p(d\I)p(v\M, 0, 1) 
x J p(A\M 7 I)p(B\M 7 I)p(y\M,v,A 7 B,I)d M Ad M I$9) 

If the observational errors {ei} are independent and have 
a Gaussian distribution with standard deviations {di} 
(the cr's are usually given with the dataset, in the form 
of an estimated uncertainty for each data point), then 
the likelihood function is 

p(y\M,is,A,B,I)<xe 2 ^*=A > (10) 

We chose the prior distribution for the amplitudes 
{Ai,Bi} to be independent Gaussians with mean zero 
and standard deviation S (in applications, we chose 5 = 
3 ms- 1 ): 



M 1 A 

p{A\M,I) = Y[ —=e-^ )2 cx,r M e-^£^ti) 

M 

B \M,I) = T]^—e-^ 2 cx S- M e-^f=^h) 



With these choices, the integral in equation 9 is a Gaus- 
sian integral which can be evaluated analytically (Bret- 
thorst 1988). We will write the final result simply as 
L(M,v), since the result of evaluating the integral now 
plays the role of the likelihood function in a simpler infer- 
ence problem in which the amplitudes no longer appear: 

p(M, v, e\y, I) cx p{M\I)p{6\I)p{u\M, 6, 1)L(M, v) 

9 ,1) (13) 
For the implementation of our code (described in sec- 
tion 2.2), a function was written that can calculate the 
likelihood L for a given set of frequencies v. Another sim- 
plification that we will make is the assumption that the 
prior probability distributions for the frequencies, given 
the 9 parameters, are all independent, so that knowing 
one frequency would not provide any information about 
the other frequencies, if we knew the 9 parameters: 

M M 

p(u\M,9,I) = Y\p{ Vl \M,9,I) = Y\g{y l ;9) (14) 

3=1 j=i 
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TABLE 1 

Proposal transitions and their acceptance probabilities 



Proposal Transition 



Acceptance Probability 



Add a frequency 

(chosen from the prior g(u; 0)) min[l, ] 



Remove a frequency 



Shift a frequency 



minfl, iafia-l 

L old ' 



(symmetric proposal distribution) min[l, — — ^- — 



oidsK^oid; 



Shift the parameters 

(symmetric proposal distribution) min[l, 



P(gncw|/mlll9(''i;flncw) 1 

pt^oidi^n^ist^^oid) J 



where the form of the function g will be specified in sec- 
tion 4; it is essentially the prior for the value of a single 
frequency, given the parameters. 

2.2. Markov Chain Monte Carlo 

Since we are interested in what the data have to say 
about the 8 parameters, it would be useful if we could cal- 
culate the marginal distribution p(6\y, I). However, this 
is not achievable in practice. A more realistic approach is 
to use a Markov Chain Monte Carlo (MCMC) algorithm 
(Gregory 2005) to generate samples of models from the 
posterior distribution in Equation 13. Then, by looking 
only at the values in the sample, we have effectively 
marginalised over all of the frequencies. In our code, 
we used a version of the Metropolis-Hastings algorithm 
(Neal 1993). Starting from a model with zero frequencies 
(M = 0, is = 0) and an arbitrary starting point for the 6 
parameters, we updated the model by proposing a small 
change, and then accepting or rejecting with a certain 
acceptance probability. The acceptance probabilities for 
the different kinds of proposed changes were chosen so 
that the stationary distribution of the Markov Chain is 
the same as the target distribution of interest (the poste- 
rior distribution in equation 13). These acceptance prob- 
abilities are shown in Table 1. If a proposed change to 
the model is rejected, the next model in the sequence is 
the same as the previous one. When this algorithm runs, 
the output of the code is a random sequence of models, 
each possibly slightly different from the last, where the 
diversity amongst the models is indicative of the uncer- 
tainty of any inference. To save memory, a subset of 
effectively independent models from this sequence may 
be used for any subsequent calculations. This is called 
"thinning" . Our use of MCMC methods for detecting si- 
nusoidal signals in a noisy time series is similar to those 
demonstrated by Bretthorst (2003), and also Umstatter 
et al. (2005), in different contexts. 

2.3. Proposal Distributions 

As with all MCMC methods which use the Metropolis- 
Hastings algorithm, suitable probability distributions for 
the proposed transitions in Table 1 must be chosen, par- 
ticularly for the transitions that shift a frequency or shift 
the 6 parameters. It is common to use a Gaussian or 



Normal distribution centred at the current value to pro- 
pose the new value of a parameter, and the width of 
the proposal distribution is chosen to achieve a moder- 
ate acceptance rate of about 20-50 per cent. However, 
finding an appropriate width for the proposal distribu- 
tion for each parameter is a time-consuming process. To 
avoid this, we simply chose all of the proposal distribu- 
tions to be mixtures of 3 Gaussian distributions centred 
at the current value, with standard deviations covering 
several different orders of magnitude. Then, the "best" 
proposal distribution width is at least being used some 
of the time. If further optimization is required, many 
interesting tricks exist, for an example, see Neal (2005). 

3. SIMULATED DATA 

Before applying the algorithm to actual data, we ap- 
plied it to simulations for which we knew the true values 
of the parameters. We aimed to see whether the algo- 
rithm successfully recovered the true values and whether 
the uncertainties returned by the program were reliable. 

In order to keep the problem as simple as possible for 
the initial tests, we used a simplified version of the fre- 
quency pattern. We assumed that the frequencies were 
evenly spaced with spacing ^Av, starting from a central 
frequency f centre, such that the frequencies were 

Vi = Centre + « — , i = -5, -4, 5 (15) 

As it is easier to infer a small number of parameters than 
a large number, we assumed that ^center was equal to the 
frequency at which the power spectrum peaks, reducing 
the problem to that of estimating a single parameter, the 
spacing \&.v. 

The time series was a sum of 17 sinusoids, with fre- 
quencies ranging from 200 /xHz to 600 /zHz, in steps of 
25 ^iHz. The phases were chosen at random, and the 
amplitudes were chosen at random from a normal distri- 
bution with a mean of zero and a standard deviation 
of 1 ms -1 . Gaps were introduced in the time series, 
since these are common in most astronomical time se- 
ries datasets, due to the fact that observing can usually 
only be done at night, and interruptions may also occur 
due to poor weather 2 . Finally, random Gaussian noise 
with a standard deviation of 2 ms _1 was added to the 
data. This resulted in the time series shown in Figure 1. 
The power spectrum is also shown, which has peaks at 
the input frequencies, but other alias peaks also exist. 

Note that our model for the frequency pattern (Equa- 
tion 15) specifies that 11 frequencies are present, whereas 
our generated data set actually contains 17. This was 
done because we are unsure of the extent to which the 
frequency pattern holds, and to demonstrate that esti- 
mation of the frequency separation is robust to these as- 
sumptions; the only disadvantage of doing this is a slight 
decrease in the accuracy of the estimate of Ai/ since we 
are only taking 11 out of the 17 frequencies into account. 

4. CHOICE OF THE PRIOR G{y; 0) 

In this section, we consider the question of how to 
choose the prior for the value of a frequency, given the 

2 In fact, the observation times we used for the simulations were 
exactly the same as the actual observation times for v Ind reported 
by Bedding et al. (2006). There are 1201 data points in the time 
series. 
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Fig. 2. — An example of the prior g(v\6). There are three 
components: the low frequency exponential, the flat component, 
and the peaks where the putative relation of Equation 15 predicts 
that a frequency should be. 



Fig. 1. — The simulated dataset used for testing our method. 
The top panel shows the time series data (the error bars, of size 2 
ms _1 , have not been plotted). The bottom panel shows the power 
spectrum of the time series, with several peaks visible at the input 
oscillation frequencies (shown with dotted lines). 



6 parameters, in other words the function g(v;6). To 
choose an appropriate function, we must answer the 
question "if we knew the values of the parameters, 
what would this tell us about the value of a frequency" ? 
Naively, we would expect the frequencies to fall exactly 
as predicted by equation 4. In this case, our prior would 
be a sum of delta functions at the predicted values of 
the frequencies. However, we do not expect the relation 
to hold exactly, as it is an approximation. There may 
be slight deviations from it, and there may also be oscil- 
lation frequencies present which do not match the pre- 
dicted frequency pattern at all, especially in an evolved 
star that exhibits mode bumping. Hence, we will need to 
use a slightly "smeared out" version of the prior, in which 
the delta functions are replaced by Gaussians, or some 
other suitable function. A uniform component was also 
included to catch any extraneous frequencies that do not 
fit in to our expectations at all. Also, we chose to include 
an additional exponential component, so low frequency 
noise (such as long term trends in the data) could be 
modelled. The final choice of the prior distribution was 



9( ,^ ) = ^/A + _L 



N 



i mx ^min) 
-AC0) 



3N ^ aV2Tr 

2—1 V 



(16) 
(17) 



where A is the scale length of the low frequency expo- 
nential (set to A=50 /iHz), {v m i ni v max ) is the frequency 
range (we used a range from to 700 ^Hz), and a is 
a tolerance parameter, specifying how accurately we ex- 
pect the predicted frequency pattern to hold; it sets the 
width of the Gaussian peaks. Note that this a is an 
additional variable and is not related to the noise ct's 



in the likelihood function (Equation 10). Since this is 
initially unknown, we included a as one of the param- 
eters which we will infer, and assigned to it a uniform 
prior density over the positive real numbers. An exam- 
ple of the prior distribution g(v; 6) for a typical value of 
the spacing and a is plotted in Figure 2. It should be 
pointed out that our choice of the prior g(y; 6) has been 
rather ad hoc. There is no reason why the three compo- 
nents should have equal weight (i.e. each component of 
the prior contains 1/3 of the probability), and a possible 
improvement to the method would involve allowing the 
weights to become free parameters as well, to be inferred 
from the data, as we have done with the a parameter 3 . 
Despite these assumptions, the code gives reasonable and 
useful results, which will be presented in Section 5. 

4.1. Parallel Tempering 

The MCMC code as described above suffers from a se- 
rious problem. Early on in the run, frequencies tend to be 
added to the model in the locations that the initial values 
of the 8 parameters predict that they should be. Then, 
once these frequencies become established, the 6 values 
are unlikely to be changed by a large amount. In other 
words, the program can become stuck in a local mini- 
mum in the paramater space. Luckily, several techniques 
exist for overcoming these kinds of problems. Parallel 
tempering (Gregory 2005) is one such method. Usually, 
several MCMC runs are run simultaneously, each with 
a progressively 'softer' version of the likelihood function, 
achieved by raising the likelihood to some power that is 
less than 1 (the reciprocal of this power is called the tem- 
perature, and a higher temperature allows the model to 
move around more freely). Then, some proposed moves 
swap the models between chains with different temper- 
atures. However, in this application, the problem does 
not arise because the likelihood function is multimodal, 
but rather because the prior g(y\ 6) is sharply peaked. 
Therefore, rather than softening the likelihood function 

3 Preliminary tests indicate that this approach is viable and does 
not significantly increase the amount of CPU time required. 
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for the high temperature chains, we softened the prior 
for the frequencies. 

Define a tempered probability distribution <7t($, M, v) 
(with temperature T) by 



/ M 

q T (0,M,»)<xp(O\I)p(M\I) \T{Z(T)g( Vi ;0) l l T I L{M.u) 

which is the same as the posterior (Equation 13) but with 
a flattened version of the prior (Z(T) is the normalisation 
constant for the prior). Then, proposed swaps between 
chains with models {d\,M\,U\) and (6 2 ,M 2 ,i/ 2 ) and tem- 
peratures T\ and T 2 are accepted with probability 

. ( A q^je^AduV^q^ (8 2 ,M 2 , v 2 ) \ 

a = mm 1, — — — r — — — . (19) 

V q Tl (6i, Mi, Ui)q T2 (6 2 , M 2 , u 2 ) J y ' 

By using this acceptance probability, the set of 
Markov Chains samples from the distribution 
q Tl (0i, Mi, v\)qT 2 (02, M2, u 2 )...q TN (On, M n , u n ), 
where Ti = 1. The output in the lowest temperature 
chain (with T = 1) samples the posterior distribution 
(equation 13), as required. We chose to run our simula- 
tions with 8 temperature levels, with a swap proposed 
every 10 steps. The temperatures used were Tj = l^ 1 ^ 1 
for i — 1,2,..., 8, as this allowed for an effectively flat 
prior at the highest temperature and an acceptance 
rate for the swaps of about 50 per cent. For different 
data sets, different tempering levels may be appropriate, 
but it is unlikely that any large modifications would 
be necessary. The performance of our algorithm is 
less sensitive to the choice of tempering levels than 
in conventional parallel tempering; this is because the 
inference is dominated by the likelihood rather than the 
prior, and we are only softening the prior. Hence, swaps 
of models between different temperature levels are often 
accepted. 

5. RESULTS 

The output from the MCMC run is shown in Figure 3. 
After an initial burn-in period, the distribution of the 
frequency spacing and the number of frequencies settles 
down to the posterior distribution. The inferred value 
of the frequency separation was (24.99±0.06)/iHz, a very 
accurate determination that is consistent with the true 
value of 25 /iHz. The histogram in the bottom right is 
an accumulation of the frequencies present in the models 
encountered throughout the run, and is very useful as 
a summary of the inference about what frequencies are 
present. By zooming in on these peaks, the uncertainty 
in any frequency can be easily measured, if this is of in- 
terest. The height of the peaks in this histogram are not 
particularly meaningful; the confidence that we have in 
the presence of a frequency is given by the fraction of 
the time that it occurs in the sequence of output mod- 
els, whereas the height of these peaks in the histogram is 
probably more dependent on the binning that has been 
used. However, the total number of frequencies present 
within a short range of frequency space is directly pro- 
portional to the posterior probability that a frequency 
exists in this range, so a histogram with wider binning 
would have the heights proportional to the confidence of 
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Fig. 3. — Results from the Markov Chain Monte Carlo run, 
using the simulated dataset. The lowest frequency (200mHz) was 
not detected, as the amplitudes were generated randomly and this 
frequency must have had a very small amplitude. 
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Fig. 4. — The marginal posterior distribution for the tolerance 
parameter rr. The quality of the data is sufficient to provide evi- 
dence that the relation holds to a high degree of accuracy, so a is 
inferred to be quite small, and the peaks in the prior become quite 
sharp. As a result, the spacing between the frequencies is inferred 
very accurately. 



the frequency detections. In this case those probabilities 
are effectively 1. It is interesting to note that, while alias 
peaks exist in the power spectrum (Figure 1), none are 
present in the output of the MCMC run. This is because 
the data are informative enough to conclusively decide 
which peaks are real and which aren't. This is typical 
behaviour, it is rare to find a genuine ambiguity. 

Interestingly, the number of frequencies has been in- 
ferred to be at least 16, but possibly up to about 20 
(the true number was 17). By examining the sample of 
models, it became clear that most of these additional 
frequencies were very close to the actual frequencies - in 
other words, it found that some of the frequencies were 
doublets. We believe this occurs because of our use of 
independent priors for each of the frequencies. For ex- 
ample, in reality, if we knew the frequency spacing and 
the values of a few of the frequencies, we would prefer 
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a new frequency to be added in one of the gaps. How- 
ever, since we have assumed independent priors for the 
frequencies, the program is just as likely to add another 
frequency into the model with the same value as an ex- 
isting frequency. This is a minor flaw in our procedure, 
but it does not cause any large errors to be made, and 
it is worth keeping the independence assumption for its 
convenience. Of course, the data are also consistent with 
there being a few other frequencies of low amplitude, 
or there being very closely spaced frequencies, but the 
Bayesian "Occam's Razor" leads the program to suggest 
M =16 as the most probable solution. 

Of course, this method may also be used when the 
predicted pattern of frequencies is not as simple as our 
evenly spaced example. With better data, it may be 
possible to infer many other parameters of this pattern, 
such as the small separation, and to search for departures 
from regularity. 

6. EFFECT OF STOCHASTIC EXCITATION AND DAMPING 

The simulated dataset described in section 3 is highly 
idealised. One important missing feature, which is an im- 
portant aspect of solar-like oscillations, is the fact that 
the individual modes are not pure sinusoids. Instead, 
they are continually damped and stochastically excited 
by convection. This is one example of how the assump- 
tions of our model may be violated by real data. It is 
important to test the method on some simulated data 
that includes this effect. To explore this, we produced 
a second simulated time series with the same frequency 
pattern, window function and noise level. We used the 
stochastic model described by Stello et al. (2004). The 
input amplitudes followed a gaussian distribution centred 
at 400 //Hz, with a FWHM = 200 //Hz and a maximum 
of 2 ms _1 . The mode lifetime was set to 3 days, in good 
agreement with measurements on the sun and other stars 
(Chaplin et al. 1997; Kjeldsen et al. 2005; Stello et al. 
2006). The results are displayed in Figure 5. Since the 
modes are no longer pure sinusoids, some will need to be 
represented by more than one frequency. This explains 
the increase in the inferred number of frequencies, from 
16 to about 24. In addition, the accuracy with which we 
can determine each frequency is reduced. As a result, 
the inferred value of a is larger (Figure 6), and the re- 
sulting uncertainty of Av is doubled, with the estimate 
of (25.01±0.13)//Hz for the spacing between the modes. 
It is encouraging that our method was still able to obtain 
the a correct result for Av. As a check of the reliability of 
our method, we ran the code on five different realisations 
of stochastically excited and damped oscillations, and the 
inferred value of Av agreed with the true one (to within 
the quoted uncertainty) each time. The fact that the in- 
ferred value of a is larger for the simulations with more 
damping (comparing Figures 4 and 6) suggests that we 
can measure the damping rate using this method. Pre- 
viously, the damping rate has been measured by forward 
modelling to see the effect that damping and reexcitation 
has on the periodogram (Stello et al. 2006). 

7. ECHELLE DIAGRAMS 

A commonly used tool in asteroseismology is the 
echelle diagram, a plot of frequency vs. frequency mod- 
ulo the large separation Av. The output of the MCMC 
program is a sequence of models for the star, allowing a 
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Fig. 5. — Same as Figure 3, but for a simulation in which each 
mode was stochastically excited and damped. More frequencies (~ 
24) are required to explain the data, and the uncertainty in the 
value of the frequency spacing is increased. 
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Fig. 6. — Same as Fig 4 but for a simulation with stochastically 
excited and damped oscillations. The inferred value of a is much 
larger than in Figure 4, and as a consequence the uncertainty about 
the frequency separation is increased. 



sequence of echelle diagrams to be plotted, one for each 
model. The diversity of this sequence of plots would 
then indicate the uncertainty that we have about the true 
echelle diagram of the star. However, it is more conve- 
nient to have a single diagram, with the uncertainties in 
the frequencies plotted as error bars. To construct such a 
diagram, we used the accumulated frequencies from the 
MCMC run (i.e. the frequencies in the histogram at the 
bottom right of Figure 5). The resulting echelle diagram 
is displayed in Figure 7. Each frequency that was en- 
countered is represented by a point in this diagram, and 
the spread for each mode arises due to the uncertainty in 
each frequency, and hence these lines can be interpreted 
as error bars. 

8. CONCLUSIONS 
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Fig. 7. — Bchclle diagram for the stochastically excited datasct. 
The large separation is 50 /iHz, which is twice the frequency spac- 
ing. The frequencies used for this plot are from the accumulated 
set of frequencies over the whole MCMC run, so each mode forms 
an almost horizontal line on this diagram, with the width of the 
line indicating the uncertainty about the frequency of that mode. 



In this paper, we presented a data analysis method for 
solar-like oscillation data, based on Bayesian inference 
and Markov Chain Monte Carlo methods. These meth- 
ods are becoming very popular in astronomy and science 
in general, as they are often able to extract more informa- 
tion from the dataset than more ad-hoc approaches. In 
addition, reliable uncertainties in all inferred quantities 
are easily obtained. 

The method we have presented here, which we have al- 
ready applied to v Ind (Bedding et al. 2006), has several 
advantages over more conventional approaches. For ex- 
ample, the CLEAN algorithm (Roberts, Lehar, & Drchcr 



1987), which is based on iterative sine wave fitting, re- 
quires that the amplitude and phase be fitted as well. 
Then, when this fitted curve is subtracted, biases are in- 
troduced. This is because the fitted parameters are not 
exactly correct. Since our code fits all of the frequen- 
cies simultaneously, it does not suffer from this problem. 
Also, our method is able to take into account impor- 
tant prior information that we have about the expected 
pattern of frequencies, which other methods ignore. It 
could be argued that we aren't really sure that the prior 
information is correct, and therefore taking it into ac- 
count may be giving us overconfident results. However, 
we have placed safeguards in our method to ensure that 
this does not happen - this was the purpose of introduc- 
ing the a parameter and allowing it to be determined 
by the data. Importantly, we have also found that when 
some of the model assumptions are violated (in particu- 
lar, if the modes are damped and stochastically excited), 
the method continues to give useful results. 

Of course, this method is more computationally inten- 
sive than the usual power spectrum based methods 4 . We 
would not recommend use of our method on very large 
datasets, for example those from the sun. In these cases, 
the power spectrum can be computed swiftly and is very 
informative. We believe that our method is best suited 
to those datasets where the data values (and hence the 
power spectrum) are noisy and incomplete, and aliasing 
causes difficulties in the interpretation of the power spec- 
trum. As observational techniques improve, the num- 
ber of target stars will increase and there will always be 
datasets for which this is the case. We hope that the 
approach we have presented here will prove useful in this 
field. 
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4 For our simulated data, usable output was obtained in a 
timescale of several hours on a modern PC. 
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